Corpus: mon_wikipedia_2018_30K


4.3.1.5 Number of Word-N-grams at Sentence Beginnings

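Number of distinct word N-grams (N = 1 to 5) found at the beginning of the first K sentences of the corpus. The rows for K = 100000 and K = 1000000 are identical, presumably because the corpus (30K) contains fewer than 100000 sentences.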


[Figure: Zipf's diagram for sentence beginnings (Gnuplot diagram)]

K          # of words   # of bigrams   # of trigrams   # of 4-grams   # of 5-grams
100                74             92              96             96             97
1000              787            959             989            994            997
10000            4345           8593            9678           9915           9981
100000          10144          24109           28829          29774          29926
1000000         10144          24109           28829          29774          29926
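
Counts of this kind could be computed roughly as in the following minimal sketch. It is an illustration, not the tool that produced the table: the function name count_initial_ngrams, the whitespace tokenization, and the choice to skip sentences shorter than N tokens are all assumptions.

```python
def count_initial_ngrams(sentences, k, max_n=5):
    """Count distinct word N-grams (N = 1..max_n) that begin one of the first k sentences.

    Assumptions (not taken from the source): tokens are split on whitespace,
    and a sentence with fewer than N tokens contributes no N-gram.
    """
    seen = {n: set() for n in range(1, max_n + 1)}
    for sentence in sentences[:k]:
        tokens = sentence.split()
        for n in range(1, max_n + 1):
            if len(tokens) >= n:
                # Only the N-gram at the very start of the sentence is recorded.
                seen[n].add(tuple(tokens[:n]))
    return {n: len(grams) for n, grams in seen.items()}


# Toy data, invented for illustration only.
sentences = [
    "the cat sat on the mat",
    "the cat ran away",
    "a dog barked",
    "the dog sat on the mat",
]
counts = count_initial_ngrams(sentences, k=4)
print(counts)
```

Because slicing with sentences[:k] stops at the end of the list, any K larger than the number of sentences yields the same counts, which matches the identical last two rows of the table.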
Computed in 2836 msec on 2024-03-29 at 02:15.